Automatic recognition of pathological phoneme production.
Abstract
OBJECTIVE: Proper diagnosis and therapy of pathological phoneme pronunciation play an important role in modern logopedics. To make diagnosis and therapy more efficient, this paper addresses the automatic recognition of pathological phoneme pronunciation. The authors focus on the therapy of phoneme substitution disorders.
PATIENTS AND METHODS: The speech samples came from speech-impaired Polish children and, in part, from persons imitating speech disorders. The speech disorders to be recognized were substitutions in pairs (for the correct phonetic characters please see online article) embedded in Polish carrier words. To detect substitutions in the recognized words, the recently proposed human factor cepstral coefficients (HFCC) were implemented. The efficiency of the HFCC approach was compared with that of standard mel-frequency cepstral coefficients (MFCC) as a feature vector. Two kinds of classifier were used: dynamic time warping (DTW), working on whole words or embedded phoneme patterns, and hidden Markov models (HMM), based on whole-word models as well as phoneme models. Results present a comparative analysis of the DTW and HMM methods.
CONCLUSIONS: The superiority of HFCC features over MFCC features was demonstrated. Results obtained by the DTW methods, mainly by the modified phoneme-based DTW classifier, were slightly better than those of the HMM classifier. Results obtained for the detection of substitutions in pairs (for the correct phonetic characters please see online article) are very promising. The methods developed for these cases can be integrated into computer systems for speech therapy. For substitutions in pairs (for the correct phonetic characters please see online article) further research is necessary.
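As an illustration of the whole-word DTW classification mentioned in the abstract, the following is a minimal sketch of template matching with textbook dynamic time warping. It is not the authors' modified phoneme-based classifier. The function names `dtw_distance` and `classify`, the labels, and the toy two-dimensional vectors (standing in for per-frame HFCC or MFCC features) are all assumptions made for this example.

```python
import math


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between frames
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


def classify(sample, templates):
    """Return the label of the reference template closest to the sample."""
    return min(templates, key=lambda label: dtw_distance(sample, templates[label]))


# Hypothetical reference patterns; real ones would be cepstral feature sequences.
templates = {
    "correct": [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],
    "substituted": [(0.0, 0.0), (1.0, -1.0), (2.0, -2.0)],
}
# A slower, slightly noisy rendition of the "correct" pattern.
sample = [(0.0, 0.0), (0.0, 0.1), (1.0, 0.9), (2.0, 2.1)]
label = classify(sample, templates)
```

Because DTW stretches the time axis, the four-frame sample is matched to the three-frame "correct" template even though the two differ in length. A phoneme-level variant would apply the same distance to embedded phoneme segments rather than to whole words.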
Similar resources
Allophone-based acoustic modeling for Persian phoneme recognition
Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighbor phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...
Designing and implementing a system for Automatic recognition of Persian letters by Lip-reading using image processing methods
For many years, speech has been the most natural and efficient means of information exchange for human beings. With the advancement of technology and the prevalence of computer usage, the design and production of speech recognition systems have been considered by researchers. Among this, lip-reading techniques encountered with many challenges for speech recognition, that one of the challenges b...
Comparison of HMM and DTW methods in automatic recognition of pathological phoneme pronunciation
In this paper, the recently proposed Human Factor Cepstral Coefficients (HFCC) are used for automatic recognition of pathological phoneme pronunciation in the speech of impaired children, and the efficiency of this approach is compared with that of the standard Mel-Frequency Cepstral Coefficients (MFCC) as a feature vector. Both dynamic time warping (DTW), working on whole words or embedded phoneme patter...
Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM
Improving phoneme recognition has attracted the attention of many researchers due to its applications in various fields of speech processing. Recent research achievements show that using deep neural network (DNN) in speech recognition systems significantly improves the performance of these systems. There are two phases in DNN-based phoneme recognition systems including training and testing. Mos...
Improving the performance of a continuous speech recognition system using features extracted from speech manifolds in the reconstructed phase space
Designing new methods for extracting features from the speech signal, and combining the information they provide, is one of the most effective approaches to improving the performance of automatic speech recognition (ASR) systems. Recent research has shown that the speech signal contains nonlinear and chaotic properties, but the effects of these properties are not used in the continuous ...
Journal: Folia phoniatrica et logopaedica : official organ of the International Association of Logopedics and Phoniatrics
Volume 60, Issue 6
Pages: -
Publication date: 2008